ABSTRACT

The general two population discrimination problem is
discussed briefly under various situations. Two discrimination
procedures are discussed and compared: the linear discriminant
function, and a nonparametric procedure due to J. L. Hodges
and E. Fix which assigns a random variable to the population
having the sample observation nearest to the observed value
of the random variable. The comparison is made by computing
the probabilities of misclassification for both procedures
when the two populations are normal with equal covariance
matrices. Probabilities of misclassification are also computed
for the nonparametric discriminator and the linear discriminant
function for two small sample sizes for the case when the
two populations being discriminated are exponential. In
this latter case, both discrimination procedures are shown
to give high probabilities of misclassification for certain
values of the parameters of the distributions being
discriminated. Regions are given in terms of the parameters
of the two exponential distributions where one of the
probabilities of error is greater than 0.5. A more complete
investigation for larger sample sizes is recommended for the
linear discriminant function and the nonparametric procedure
discussed in this paper for the case when the two populations
being discriminated are exponential.
SECTION I

INTRODUCTION

The two population discrimination problem may be
summarized as follows: given a random variable $Z$ distributed
over some $p$-dimensional space according to a distribution $F$,
or according to a distribution $G$, determine on the basis of
an observation, say $z$ of $Z$, which of the two distributions
$Z$ has.

When $F$ and $G$ are completely known, the solution to the
problem is implicit in the Neyman-Pearson lemma (1). The
discrimination depends on the ratio $f(z)/g(z)$, where $f$ and
$g$ are the respective density functions of $F$ and $G$. The
rule is as follows:

    If $f(z)/g(z) < C$, decide in favor of $G$;
    if $f(z)/g(z) > C$, decide in favor of $F$;
    if $f(z)/g(z) = C$, the decision is arbitrary.


$C$ is an appropriate positive constant chosen on the
basis of considerations relating to the importance of the two
possible errors:

    (i)  $P_1$ = P(Z is assigned to G | Z came from F)
    (ii) $P_2$ = P(Z is assigned to F | Z came from G).

The two most widely advocated choices of $C$ are:

    (a) Take $C = 1$.
    (b) Choose $C$ such that $P_1 = P_2$.

This procedure, known as the "likelihood ratio procedure,"
is known to have optimum properties with regard to
control of the probability of misclassification.
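
The decision rule above can be sketched in a few lines of Python.
This is an illustrative sketch only: the function names
`normal_pdf` and `likelihood_ratio_decision`, and the choice of
two unit-variance normal densities in the usage lines, are
assumptions made for the example and do not come from the
references.

```python
import math


def normal_pdf(x, mean, sd=1.0):
    """Normal density, used here only as a stand-in for f and g."""
    u = (x - mean) / sd
    return math.exp(-0.5 * u * u) / (sd * math.sqrt(2.0 * math.pi))


def likelihood_ratio_decision(z, f, g, c=1.0):
    """Decide between F and G by comparing f(z)/g(z) with the constant C.

    The comparison is written as f(z) versus C*g(z) so that g(z) == 0
    does not cause a division error.
    """
    fz, gz = f(z), g(z)
    if fz > c * gz:
        return "F"
    if fz < c * gz:
        return "G"
    return "F"  # ratio equals C: the decision is arbitrary, F is picked here


# Hypothetical example: F = N(0, 1), G = N(2, 1), C = 1.
f = lambda z: normal_pdf(z, 0.0)
g = lambda z: normal_pdf(z, 2.0)
decision_near_f = likelihood_ratio_decision(0.5, f, g)
decision_near_g = likelihood_ratio_decision(1.5, f, g)
```

With $C = 1$ and these two example densities, the rule reduces to
assigning $z$ to the population whose mean is nearer.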

When $F$ and $G$ are known except for the values of one or
more parameters, the procedure used is much the same as that
just described. Assume that samples are available, say

    $X_1, X_2, \ldots, X_m$ from $F$
    $Y_1, Y_2, \ldots, Y_n$ from $G$,

so that the unknown parameters, denoted collectively by
$\theta$, can be estimated. By some estimation procedure, we
estimate $\theta$ by $\hat\theta$ and assume that
$F_{\hat\theta}$ and $G_{\hat\theta}$ are the correct
distribution functions. The "likelihood ratio procedure"
and the decision rules outlined above can now be applied.

If it is assumed that $F$ and $G$ are $p$-variate normal
distributions having the same (unknown) covariance matrix
and unknown expectation vectors, the linear discriminant
function is a good example of this procedure (2). The given
samples are used to estimate the covariance matrix and the
expectation vectors, and the "likelihood ratio procedure" is
used under the assumption that the estimated parameters are
known to be correct. It is known that under the normal
assumption for $F$ and $G$ and the homoscedastic assumption,
the linear discriminant function is an optimal procedure.

Although this procedure seems reasonable when the
assumed parametric form of the distributions is correct,
there is concern about the validity of this procedure when
the linear discriminant function is used with data that are
not normal or, if normal, have unequal covariance matrices.
In fact, in the normal situation when the covariance matrices
are not equal, a quadratic function can be shown to be
optimal. There is a need, then, for a reasonable
discrimination procedure whose validity does not require the
knowledge implied by the normality assumption, the
homoscedastic assumption, or any assumption about the
parametric form.

Several classes of nonparametric discrimination
procedures were proposed in (3). These procedures were proven
to have asymptotically optimum properties for large samples.
In (4), some of these nonparametric procedures were
investigated when the samples were small. These procedures
were compared with the linear discriminant function where $F$
and $G$ were assumed normal with equal covariance matrices,
since under these assumptions the linear discriminant
function is known to be optimal. The comparison was made by
setting the probabilities of misclassification when the
linear discriminant function was used against the
probabilities of misclassification when the nonparametric
procedures were used. A survey of the procedures and results
of (4) is given in Section II of this paper.

In Section III of this paper, an investigation is made
of the performance of one of the nonparametric discriminators
discussed in (4) and of the performance of the linear
discriminant function when $F$ and $G$ are not normal but, in
fact, exponential with parameters $\lambda$ and $\mu$
respectively. The exponential distribution was selected
because of the role it plays in the field of life testing and
other applied problems. It is shown that for sample sizes of
1 and 2, both the nonparametric discriminator and the linear
discriminant function give very poor results for certain
values of $\lambda$ and $\mu$.

Detailed conclusions and recommendations made on the
basis of the results obtained in Sections II and III are
contained in Section IV of this paper.

Professors R. R. Read and J. R. Borsting, of the U. S. 
Naval Postgraduate School, have generously given their time 
to provide direction, encouragement and valuable advice to 


the author in the writing of this paper. 





SECTION II
PERFORMANCE OF THE LINEAR DISCRIMINANT FUNCTION
AND A CLASS OF NONPARAMETRIC DISCRIMINATORS
WHEN THE TWO POPULATIONS BEING DISCRIMINATED
HAVE NORMAL DISTRIBUTIONS WITH
EQUAL COVARIANCE MATRICES


Let $X_1, X_2, \ldots, X_m$ be a sample from a $p$-variate
distribution $F$ and let $Y_1, Y_2, \ldots, Y_n$ be a sample
from a $p$-variate distribution $G$. It is assumed further
that the parametric forms of $F$ and $G$ are unknown. If $z$
is an observation of a random variable $Z$ known to be
distributed either as $F$ or as $G$, how is it decided on the
basis of $z$ to which population $Z$ belongs? Define a
distance function (in $p$-dimensional space) which will permit
a ranking of the $m+n$ observations according to their
"nearness" to $z$. The essence of the discrimination
procedures outlined in (3) is to assign $Z$ to the population
which has the most observations nearest to $z$. Specifically,
choose an odd integer $k$ and assume for simplicity that
$m = n$; then $Z$ is assigned to the distribution from which
came the majority of the $k$ nearest observations. In (3), it
was shown that several classes of these nonparametric
discriminators have asymptotically optimum performance as
$m \to \infty$ and $n \to \infty$ at the same rate. By optimum
performance, it is meant that the probabilities of
misclassification $P_1$ and $P_2$, as defined in the
introduction, tend to the theoretical minimum values which
they could have if $F$ and $G$ were completely known.
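
A minimal Python sketch of the majority-vote rule just described
is given here. It is an illustration under stated assumptions,
not the procedure of (3) verbatim: the function names are
hypothetical, and the default distance (the largest absolute
coordinate difference) is only one possible choice of distance
function.

```python
def max_coordinate_distance(x, z):
    """Largest absolute coordinate difference between two points."""
    return max(abs(a - b) for a, b in zip(x, z))


def k_nearest_rule(z, sample_f, sample_g, k=1, distance=max_coordinate_distance):
    """Assign z to the population supplying the majority of its k nearest
    neighbors in the pooled sample. k must be odd so that no tie can occur."""
    if k % 2 == 0:
        raise ValueError("k must be an odd integer")
    pooled = [(distance(x, z), "F") for x in sample_f]
    pooled += [(distance(y, z), "G") for y in sample_g]
    pooled.sort(key=lambda pair: pair[0])          # rank by nearness to z
    votes_f = sum(1 for _, label in pooled[:k] if label == "F")
    return "F" if votes_f > k - votes_f else "G"


# Hypothetical bivariate samples (p = 2, m = n = 3).
sample_f = [(0.0, 0.0), (0.5, 0.2), (-0.3, 0.1)]
sample_g = [(3.0, 3.0), (2.5, 3.2), (3.1, 2.7)]
label_a = k_nearest_rule((0.1, 0.1), sample_f, sample_g, k=3)
label_b = k_nearest_rule((2.9, 3.0), sample_f, sample_g, k=1)
```

Passing a different `distance` callable changes the ranking of
the pooled observations without touching the voting step.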

The asymptotic properties and the simplicity of applying
the procedures of this class of nonparametric discriminators
suggest that this type of procedure might be a reasonable
alternative to the commonly applied linear discriminant
function. However, to propose an alternative to the linear
discriminant function solely on the basis of asymptotic
properties and ease of application would not be entirely
reasonable. In particular, the small sample performance of
such nonparametric discriminators needs investigation to
ascertain how much discrimination power is lost when $F$ and
$G$ are known to be normal with equal covariance matrices, so
that the linear discriminant function is appropriate. One way
this investigation can be accomplished is by comparing the
probabilities of misclassification when the linear
discriminant function is used with the corresponding
probabilities of misclassification when the nonparametric
discriminators are used. Such an investigation was made in
(4). The remainder of Section II is devoted to summarizing
the procedures and results of (4).

It is first pointed out that the problem can be reduced
considerably by considering linear transformations in the
observation space. It is always possible by such
transformations to insure that $F$ and $G$ will have the
identity covariance matrix. In other words, in the new space
the $p$ transformed measurements are independent in each
population and each measurement has unit variance. It is also
possible by such transformations to put the expectation
vector of the $F$ population at the origin and the expectation
vector of the $G$ population on the positive first axis. This
allows complete specification of the transformed populations
by the two parameters $p$ and $\lambda$, where

    $\lambda$ = E(first coordinate of $Y$)
              = distance between the means of the
                transformed populations.

In performing such linear transformations, $P_1$ and $P_2$ for
the linear discriminant function are unchanged. The
probabilities $P_1$ and $P_2$ for the nonparametric
discriminators are likewise unchanged, since such linear
transformations map the totality of distance functions
one-one into the totality in the new space.

It is assumed that the sizes of the samples taken from
each population are equal, $m = n$. In the main, the distance
function used is

    $\Delta(x, z) = \max_{1 \le i \le p} |x_i - z_i|$.

It should be pointed out that $\Delta$ is just one of a large
class of distance functions, any one of which could be used.
This fact is mentioned since the probabilities $P_1$ and $P_2$
depend very heavily on the distance function chosen. Most of
the computations are made using $k = 1$, that is, assign $Z$
to the population $F$ or $G$ from which came the individual of
the pooled samples which most closely resembles $Z$.

The first case considered is the univariate case, $p = 1$.
Using the rule of the "nearest neighbor," that is, $k = 1$,
and the distance function $\Delta = |x - z|$, which
corresponds to ordinary Euclidean distance in this case, the
probabilities $P_1$ and $P_2$ are computed for various values
of $n$ and $\lambda$.

For $p = 1$, the linear discriminant function is greatly
simplified since no matrix computation enters. The arithmetic
mean $(\bar{X} + \bar{Y})/2$ of the sample means is computed
and $Z$ is assigned to that population whose sample mean lies
on the same side of $(\bar{X} + \bar{Y})/2$ as does $z$
itself. The probabilities of misclassification are now
readily computed.

From the symmetry of the problem, $P_1 = P_2$, so it is
sufficient to compute $P_1$; thus, it is assumed that $Z$ is
distributed according to the $F$ distribution. As was pointed
out previously, linear transformations make it possible to
put $E(X) = 0$, $E(Y) = \lambda > 0$ and
$\sigma_X = \sigma_Y = 1$ with no loss of generality.

An error is committed by the linear discriminant
function if and only if

    (i)  $Z > (\bar{X} + \bar{Y})/2$ and $\bar{Y} > \bar{X}$, or
    (ii) $Z < (\bar{X} + \bar{Y})/2$ and $\bar{Y} < \bar{X}$.

Define $U = \bar{Y} - \bar{X}$ and
$V = \bar{X} + \bar{Y} - 2Z$. It is easily shown that $U$





and $V$ are independent normal random variables with
$E(U) = \lambda$, $\sigma_U^2 = 2/n$, $E(V) = \lambda$,
$\sigma_V^2 = 4 + 2/n$. In terms of the variables $U$ and $V$,
an error is committed by the linear discriminant function if
and only if $UV < 0$. Thus the probability of error for the
linear discriminant function when $p = 1$ is

    $P_1 = P_2 = \Phi(\lambda/\sigma_U)\,[1 - \Phi(\lambda/\sigma_V)]
               + [1 - \Phi(\lambda/\sigma_U)]\,\Phi(\lambda/\sigma_V)$,

where

    $\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-w^2/2}\,dw$.

Since $\lim_{n \to \infty} P_1 = \Phi(-\lambda/2)$, it is
observed that the maximum probability of misclassification is
.5. The values of $P_1 = P_2$ for various values of $n$ and
$\lambda$ are given in Table 1. Figures 1 and 2 give these
results graphically. All Tables and Figures in Section II
have been reproduced from (4).
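
The error probability of the linear discriminant function in the
univariate normal case can be evaluated directly from the
independence of $U$ and $V$. The following Python sketch does so
using only the standard library; the function names are
hypothetical, and the formula coded is
$P(U>0)P(V<0) + P(U<0)P(V>0)$ with $E(U) = E(V) = \lambda$,
$\sigma_U^2 = 2/n$ and $\sigma_V^2 = 4 + 2/n$.

```python
import math


def std_normal_cdf(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ldf_error_probability(n, lam):
    """P(UV < 0) for independent normal U and V with mean lam,
    Var(U) = 2/n and Var(V) = 4 + 2/n."""
    sigma_u = math.sqrt(2.0 / n)
    sigma_v = math.sqrt(4.0 + 2.0 / n)
    p_u_pos = std_normal_cdf(lam / sigma_u)   # P(U > 0)
    p_v_pos = std_normal_cdf(lam / sigma_v)   # P(V > 0)
    return p_u_pos * (1.0 - p_v_pos) + (1.0 - p_u_pos) * p_v_pos


p_n1 = ldf_error_probability(1, 1.0)           # roughly 0.4175
p_large_n = ldf_error_probability(10**6, 2.0)  # close to Phi(-1)
```

For very large $n$ the value approaches $\Phi(-\lambda/2)$, and
for $\lambda = 0$ it equals 0.5 for every $n$.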

We consider now the nonparametric discriminator using
the "rule of the nearest neighbor," $k = 1$, which consists of
assigning $Z$ to that population from which came the sample
individual nearest to $z$. Suppose that $Z = z$. Let $P_1(z)$
denote the conditional probability that the nearest of the
$2n$ sample observations to $z$ is a $y$, given $Z = z$. Then,

    $P_1 = \int_{-\infty}^{\infty} P_1(z)\,
           \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz$.        (1)





TABLE 1

PROBABILITY OF ERROR, LINEAR DISCRIMINANT FUNCTION,
UNIVARIATE NORMAL DISTRIBUTIONS

n = size of sample taken from each population
$\lambda$ = distance between the means of the two populations
Probability of error = P(Z is assigned to G | Z came from F)
                     = P(Z is assigned to F | Z came from G)

FIGURE 1

Probability of error $P_1$ of the linear discriminant
function for two univariate normal distributions with
distance between means = $\lambda$

n = size of sample from each population
FIGURE 2

Probability of error $P_1$ of the linear discriminant
function for two univariate normal distributions with
distance between the means = $\lambda$, plotted as a function
of $\lambda$

n = size of sample from each population





It remains then to calculate $P_1(z)$. Define

    $H_z(\delta) = P(|X - z| < \delta)$,   $\delta > 0$
                 $= P(z - \delta < X < z + \delta)$
                 $= \Phi(z + \delta) - \Phi(z - \delta)$,

and

    $K_z(\delta) = P(|Y - z| < \delta)$
                 $= P(z - \lambda - \delta < Y - \lambda < z - \lambda + \delta)$
                 $= \Phi(z - \lambda + \delta) - \Phi(z - \lambda - \delta)$.

The event "the nearest sample value to z is a y" can
be classified into the $n$ exclusive equiprobable events "the
nearest sample value to z is $y_i$," $i = 1, 2, \ldots, n$.
Since the nearest $y$ to $z$ will necessarily be the one at
minimum distance, it is necessary to compute the probability
density function for the minimum of
$|Y_1 - z|, |Y_2 - z|, \ldots, |Y_n - z|$. Since the
$|Y_i - z|$, $i = 1, 2, \ldots, n$, are independent
identically distributed random variables, this density
function is easily shown to be

    $n\,(1 - K_z(\delta))^{n-1}\, dK_z(\delta)$.

$P_1(z)$ is then computed by the following formula:

    $P_1(z) = n \int_0^{\infty} (1 - H_z(\delta))^{n}\,
              (1 - K_z(\delta))^{n-1}\, dK_z(\delta)$.        (2)

Formulae (1) and (2) form the basis for all the computations
for the "nearest neighbor rule" no matter what the value of
$p$, if for $p > 1$ one replaces $P(|X - z| < \delta)$ by
P(the distance of $X$ from $z$ < $\delta$) and similarly
$P(|Y - z| < \delta)$ by P(the distance of $Y$ from $z$ <
$\delta$). Of course the specific evaluations depend upon the
distance function used.
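
As an illustration of how formulae (1) and (2) can be evaluated
for $p = 1$, the following Python sketch applies a plain midpoint
rule to both integrals. It is a sketch under assumptions, not the
numerical scheme used in (4): the function names, the integration
limits and the step counts are all choices made here, and
$dK_z(\delta)$ is written as
$[\phi(z - \lambda + \delta) + \phi(z - \lambda - \delta)]\,d\delta$,
where $\phi$ is the standard normal density.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def conditional_error(z, n, lam, d_max=12.0, steps=600):
    """Formula (2): probability that the nearest of the 2n sample values
    to z is a y, given Z = z (midpoint rule in delta)."""
    h = d_max / steps
    total = 0.0
    for j in range(steps):
        d = (j + 0.5) * h
        H = Phi(z + d) - Phi(z - d)                 # P(|X - z| < d)
        K = Phi(z - lam + d) - Phi(z - lam - d)     # P(|Y - z| < d)
        dK = phi(z - lam + d) + phi(z - lam - d)    # derivative of K in d
        total += (1.0 - H) ** n * (1.0 - K) ** (n - 1) * dK
    return n * total * h


def nearest_neighbor_error(n, lam, z_max=6.0, steps=240):
    """Formula (1): average of conditional_error over Z ~ N(0, 1)."""
    h = 2.0 * z_max / steps
    total = 0.0
    for i in range(steps):
        z = -z_max + (i + 0.5) * h
        total += conditional_error(z, n, lam) * phi(z)
    return total * h


p_nn_n1 = nearest_neighbor_error(1, 1.0)
p_nn_n3 = nearest_neighbor_error(3, 1.0)
```

For $n = 1$ the result should agree with the linear discriminant
function value for $n = 1$, $\lambda = 1$ (about 0.4175), since the
two rules coincide in that case.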







Except for the case $p = 1$, $n = 1$, in which case $P_1$ and
$P_2$ are the same for the linear discriminant function and
the nonparametric discriminator, the bulk of the computations
for the nonparametric discriminator were carried out by
straightforward numerical integration. These computations
are given in Table 2. These computations are quite heavy,
especially for the case $p = 2$. Therefore, a search for an
approximation formula for the computation of $P_1(z)$ was
instituted. One approximation formula was found which gave
very good results. A discussion of this approximation
formula is given in (4). $P_1$ as computed using the
approximation formula for $P_1(z)$ is tabled in Table 2-A. One
very interesting result which was obtained using the
approximation formula for $P_1(z)$ was that for large $n$,



    $P_1 \to \int_{-\infty}^{\infty}
             \frac{f(z)\,g(z)}{f(z) + g(z)}\, dz$.

An application of Schwarz's inequality shows the latter
integral to be at most 0.5. It is thus possible to assert
that, whatever be the populations being discriminated, the
"rule of the nearest neighbor" will have in the limit as
$m = n \to \infty$ equal probabilities of error at most 0.5.
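
The bound of 0.5 can be checked numerically. The Python sketch
below rests on an assumption stated plainly: it takes the
limiting error probability of the nearest neighbor rule with
$m = n$ to be $\int f g/(f + g)\,dz$, the probability that the
nearest sample point to an observation from $F$ comes from $G$
when both samples are dense. The densities are the two
unit-variance normal densities of this section, and the function
names and integration grid are choices made here.

```python
import math


def normal_pdf(x, mean=0.0):
    """Unit-variance normal density."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def limiting_nn_error(lam, steps=4000):
    """Midpoint-rule value of the integral of f*g/(f+g) for
    f = N(0, 1) and g = N(lam, 1), over [-10, lam + 10]."""
    z_min, z_max = -10.0, lam + 10.0
    h = (z_max - z_min) / steps
    total = 0.0
    for i in range(steps):
        z = z_min + (i + 0.5) * h
        f = normal_pdf(z)
        g = normal_pdf(z, lam)
        total += f * g / (f + g)
    return total * h


limit_same = limiting_nn_error(0.0)   # identical populations
limit_lam1 = limiting_nn_error(1.0)
limit_lam3 = limiting_nn_error(3.0)
```

Because $fg/(f+g) \le (f+g)/4$ pointwise, the integral cannot
exceed 0.5, and it equals 0.5 only when the two densities
coincide.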

To compare the figures of Tables 1, 2, and 2-A, the
values of $P_1 = P_2$ for paired values of $\lambda$ are
plotted against $n$ in Figure 3. In Figure 4, the same values
are plotted against $\lambda$ for selected values of $n$.





TABLE 2

PROBABILITY OF ERROR, NONPARAMETRIC DISCRIMINATOR
WITH k=1, UNIVARIATE NORMAL DISTRIBUTION

    n     $\lambda$ = 1    $\lambda$ = 2    $\lambda$ = 3
    1     .4175            ?                ?
    2     .4086            .236?            .108?
    3     .4052            .2307            .1036
    4     .4032            .2280            .1014

TABLE 2-A

APPROXIMATE PROBABILITY OF ERROR, NONPARAMETRIC
DISCRIMINATOR WITH k=1, UNIVARIATE NORMAL DISTRIBUTION

n = size of sample from each population
$\lambda$ = distance between the means of the two populations
Probability of error = P(Z is assigned to G | Z came from F)
                     = P(Z is assigned to F | Z came from G)
FIGURE 3

Comparison of the probability of error $P_1$ as a function of
$n$ for the linear discriminant function and the nonparametric
discriminator, distance function $\Delta$, k=1, for two normal
univariate populations with distance between means =
$\lambda$.

n = size of sample from each population

FIGURE 4

Comparison of the probability of error $P_1$ as a function of
$\lambda$, the distance between the means, for the linear
discriminant function and the nonparametric discriminator,
distance function = $\Delta$, k=1, for two normal univariate
populations

n = size of sample from each population

$P_1$ is identical for both procedures when n = 1

---- indicates the nonparametric procedure
Not discussed in this paper, but investigated to a very
limited extent in (4), are the following cases:

    (i)   the nonparametric discriminator using $\Delta$ as a
          distance function with $k = 3$ for the univariate
          and bivariate normal distributions
    (ii)  the nonparametric discriminator using $\Delta$ as a
          distance function, $k = 1$, $n = 1$ for $p \ge 2$
    (iii) the effect of distance functions other than $\Delta$
          on the probabilities of misclassification for the
          bivariate normal distribution

Although the investigation of the above cases was
extremely limited due to the laborious computations, the
results that were obtained indicated that the nonparametric
discrimination procedure gave "reasonable" error
probabilities in both cases (i) and (ii). In the bivariate
normal distribution, different distance functions produced
vastly different error probabilities in some situations.





SECTION III
PERFORMANCE OF THE LINEAR DISCRIMINANT FUNCTION
AND A CLASS OF NONPARAMETRIC DISCRIMINATORS
WHEN F AND G ARE EXPONENTIALLY DISTRIBUTED


In this section, a limited investigation of the linear
discriminant function and the nonparametric discriminator
using $\Delta$ as a distance function and using "the rule of
the nearest neighbor," $k = 1$, is made when $F$ and $G$ are
not normally distributed but, in fact, exponentially
distributed with parameters $\lambda$ and $\mu$ respectively.
The performance of both the linear discriminant function and
the nonparametric discriminator will be investigated again by
computing the probabilities of misclassification. Under the
assumption that $F$ and $G$ are exponentially distributed, it
will be shown that the linear discriminant function and the
nonparametric discriminator using $\Delta$ as a distance
function and "the rule of the nearest neighbor" can give high
probabilities of misclassification.

Throughout the remainder of the section, it will be
assumed that $m = n$ and that $F$ and $G$ are exponentially
distributed with parameters $\lambda$ and $\mu$ respectively.
Because of the heavy computations involved in computing the
probabilities of misclassification,

    (i)  $P_1$ = P(assigning Z to G | Z came from F)
    (ii) $P_2$ = P(assigning Z to F | Z came from G),





the only cases investigated will be for $p = 1$ and
$n = 1, 2$.

$P_1$ and $P_2$ will first be computed for the linear
discriminant function. The procedure here is precisely that
which was used in Section II for $p = 1$. One simply computes
the arithmetic mean $(\bar{X} + \bar{Y})/2$ of the sample
means and assigns $Z$ to that population whose sample mean
lies on the same side of $(\bar{X} + \bar{Y})/2$ as does $z$
itself. While $P_1 \ne P_2$, it is only necessary to compute
$P_1$ since $P_2$ can readily be computed from $P_1$ by
interchanging $\lambda$ and $\mu$.
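
The univariate assignment rule just described amounts to a
comparison of signs. A minimal Python sketch follows; the
function name and the example samples are hypothetical, and the
handling of exact ties (returning "G") is an arbitrary choice
made here.

```python
def ldf_assign_univariate(z, sample_f, sample_g):
    """Assign z to the population whose sample mean lies on the same
    side of the midpoint of the two sample means as z does."""
    mean_f = sum(sample_f) / len(sample_f)
    mean_g = sum(sample_g) / len(sample_g)
    midpoint = 0.5 * (mean_f + mean_g)
    # A positive product means z and the F sample mean are on the same side.
    if (z - midpoint) * (mean_f - midpoint) > 0.0:
        return "F"
    return "G"  # also returned on exact ties, which is an arbitrary choice


# Hypothetical samples with n = 2: midpoint of the sample means is 0.9.
assigned_low = ldf_assign_univariate(0.5, [0.2, 0.4], [1.0, 2.0])
assigned_high = ldf_assign_univariate(1.2, [0.2, 0.4], [1.0, 2.0])
```

The rule does not use the distributional form at all, which is
why its error probabilities change when the populations are
exponential rather than normal.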

Proceeding as in Section II, define the new variables
$U = \bar{Y} - \bar{X}$ and $V = \bar{X} + \bar{Y} - 2Z$. If
$U$ and $V$ are to be independent, it is necessary that the
covariance of $U$ and $V$ be zero. Computing the covariance
of $U$ and $V$ we have:

    $\mathrm{Cov}(U,V) = \mathrm{Var}(\bar{Y}) - \mathrm{Var}(\bar{X})
       = \frac{1}{n}\left(\frac{1}{\mu^2} - \frac{1}{\lambda^2}\right)
       \ne 0$ except for $\lambda = \mu$.

Since discrimination is not possible for $\lambda = \mu$,
$\mathrm{Cov}(U,V)$ will not be zero and in general $U$ and
$V$ will not be independent. As before, an error is committed
by the linear discriminant function if and only if

    (i)  $Z > (\bar{X} + \bar{Y})/2$ and $\bar{Y} > \bar{X}$, or
    (ii) $Z < (\bar{X} + \bar{Y})/2$ and $\bar{Y} < \bar{X}$.

In terms of the variables $U$ and $V$, an error is committed
if and only if $UV < 0$, and therefore,

    $P_1 = P(UV < 0)$.
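
The probability $P(UV < 0)$ can also be estimated by simulation,
which gives an independent check on any closed-form or
numerically integrated value. The Python sketch below is a Monte
Carlo substitute for the integration carried out in this paper,
not a reproduction of it. It assumes that $\lambda$ and $\mu$ are
rate parameters (densities $\lambda e^{-\lambda x}$ and
$\mu e^{-\mu x}$); the function name, trial count and seed are
choices made here.

```python
import random


def simulate_ldf_error(n, lam, mu, trials=100_000, seed=12345):
    """Estimate P1 = P(UV < 0) when X_i and Z are exponential with
    rate lam and Y_i are exponential with rate mu."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        x_bar = sum(rng.expovariate(lam) for _ in range(n)) / n
        y_bar = sum(rng.expovariate(mu) for _ in range(n)) / n
        z = rng.expovariate(lam)             # Z is drawn from F
        u = y_bar - x_bar
        v = x_bar + y_bar - 2.0 * z
        if u * v < 0.0:                      # Z is assigned to G
            errors += 1
    return errors / trials


estimate_n1 = simulate_ldf_error(1, 2.0, 1.0)   # lam = 2 * mu
estimate_n2 = simulate_ldf_error(2, 3.0, 1.0)   # lam = 3 * mu
```

Interchanging the arguments `lam` and `mu` gives the corresponding
estimate of $P_2$.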





Since $U$ and $V$ are not in general independent, the
probability that $UV < 0$ is not easily computed. It is
necessary to compute the joint density function for $U$ and
$V$ and integrate over the region where $UV < 0$. The joint
density function of $U$ and $V$ was computed, but because of
the complex nature of this function, it was considered easier
to compute $P_1$ directly. By (i) and (ii) and the definition
of $P_1$ it follows that

    $P_1 = P\big(Z > (\bar{X}+\bar{Y})/2,\ \bar{Y} > \bar{X}\big)
         + P\big(Z < (\bar{X}+\bar{Y})/2,\ \bar{Y} < \bar{X}\big)$.

Let $T = n\bar{Y}$ and $S = n\bar{X}$, so that

    $f_T$ is the gamma density function with parameters $n$ and $\mu$,
    $f_S$ is the gamma density function with parameters $n$ and $\lambda$.

Since $T$, $S$, and $Z$ are independent random variables,

    $P_1 = \int_0^{\infty}\!\int_s^{\infty}\!\int_{(s+t)/2n}^{\infty}
             f_Z(z)\, f_T(t)\, f_S(s)\; dz\, dt\, ds$
         $+ \int_0^{\infty}\!\int_0^{s}\!\int_0^{(s+t)/2n}
             f_Z(z)\, f_T(t)\, f_S(s)\; dz\, dt\, ds$.


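$P_1$ can now be computed by direct numerical integration.
For $n = 1$, $P_1$ as a function of $\lambda$ and $\mu$ is

    $P_1(\lambda, \mu) =
       \frac{\mu\,(10\lambda^2 + 15\lambda\mu + 2\mu^2)}
            {3(\lambda+\mu)(2\lambda+\mu)(\lambda+2\mu)}$.

By interchanging $\lambda$ and $\mu$, $P_2$ is

    $P_2(\lambda, \mu) = P_1(\mu, \lambda)$.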
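
For $n = 1$ the error probabilities can be coded in a few lines.
The Python sketch below uses the closed form
$P_1(c) = (10c^2 + 15c + 2) / [3(c+1)(c+2)(2c+1)]$ with
$\lambda = c\mu$, obtained here by carrying out the integration
for $n = 1$ under the assumption that $\lambda$ and $\mu$ are rate
parameters; under that reading it agrees with the legible
$n = 1$ entries of Table 3 (for example .3262 and .5333). The
function names are hypothetical.

```python
def p1_n1(c):
    """P(assigning Z to G | Z came from F) for n = 1, with lam = c * mu."""
    return (10.0 * c * c + 15.0 * c + 2.0) / (
        3.0 * (c + 1.0) * (c + 2.0) * (2.0 * c + 1.0)
    )


def p2_n1(c):
    """P(assigning Z to F | Z came from G) for n = 1.

    Interchanging lam and mu replaces c by 1/c.
    """
    return p1_n1(1.0 / c)


# Error probabilities at a few ratios c = lam / mu.
values = {c: (round(p1_n1(c), 4), round(p2_n1(c), 4)) for c in (1, 2, 3, 10)}
```

Note that $P_2$ exceeds 0.5 at $c = 2$ and $c = 3$ even though
$P_1$ falls as $c$ grows, which is the behavior examined in the
remainder of this section.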





Recognizing that the numerator and denominator in the
expressions for $P_1$ and $P_2$ are homogeneous of degree 3 in
$\lambda$ and $\mu$, $P_1$ and $P_2$ can be expressed in terms
of a single parameter $c$ by setting $\lambda = c\mu$. Making
this substitution in the expressions for $P_1$ and $P_2$ we
have

    $P_1(c) = \frac{10c^2 + 15c + 2}{3(c+1)(c+2)(2c+1)}$,

    $P_2(c) = \frac{c\,(2c^2 + 15c + 10)}{3(c+1)(c+2)(2c+1)}$.

For $n = 1$, $P_1$ and $P_2$ for the linear discriminant
function are the same as $P_1$ and $P_2$ for the nonparametric
discriminator using $\Delta$ as a distance function and "the
nearest neighbor rule, k=1."

For $n = 2$, the substitution $\lambda = c\mu$ is again
appropriate and $P_1$ and $P_2$ for $n = 2$ are as follows,

Pile) a ees es ahem ae 2 Clee 


(c + hy {3c + 2), Moras ne PAB Sica 2) 


fo) = yd 


Values of $P_1$ and $P_2$ for the linear discriminant function
for $n = 1$ and 2 are tabled for various values of $c$ in
Table 3.

$P_1$ and $P_2$ are next computed for the nonparametric
discriminator for the case $n = 2$. The procedure used is
exactly the procedure used in Section II. The substitution
$\lambda = c\mu$ is once more appropriate. $P_1$ and $P_2$ in
terms of a single parameter $c$ are as follows:
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(c= 1)(2c +1) . 
(3c + 8) 1 for c # 2 
+ $85 a cue) : 
P_(c) = (30¢% = 38¢ = 112) , (32 + 2he = 56c° = 12c3) 
° iste + 2)(e = 1) 
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, (203 + 1c? ~ 2c) (Alo = le = tag) 
(50 + 2) Ste + Ayte - 1) 
Values of $P_1$ and $P_2$ for various values of $c$ for the
nonparametric discriminator with $n = 2$ are given in Table 3.

It is observed in Table 3 that $P_1$ and $P_2$ exceed 0.5
for numerous values of $c$. Because of this observation, an
investigation was made to determine the values of $c$ for
which $P_1$ and $P_2$ exceed 0.5. Figure 5 displays
graphically the regions in the $\lambda$, $\mu$ plane where
$P_1$ and $P_2$ are greater than 0.5.
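
For $n = 1$ the boundary of such a region can be located with a
short bisection. The Python sketch below is illustrative and
rests on assumptions: it uses the $n = 1$ closed form
$P_2(c) = c(2c^2 + 15c + 10) / [3(c+1)(c+2)(2c+1)]$ with
$\lambda = c\mu$ and rate-parameter exponentials, and the
function names and bracketing interval are choices made here.
Algebraically, $P_2(c) > 0.5$ is equivalent to
$(c - 1)(2c^2 - 7c - 6) < 0$, so the upper boundary is the
positive root of $2c^2 - 7c - 6$.

```python
import math


def p2_n1(c):
    """P(assigning Z to F | Z came from G) for n = 1, lam = c * mu."""
    return c * (2.0 * c * c + 15.0 * c + 10.0) / (
        3.0 * (c + 1.0) * (c + 2.0) * (2.0 * c + 1.0)
    )


def upper_crossing(lo=2.0, hi=10.0, tol=1e-10):
    """Bisection for the c > 1 at which p2_n1 falls back to 0.5.

    Requires p2_n1(lo) > 0.5 > p2_n1(hi), which holds for the defaults.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if p2_n1(mid) > 0.5:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


c_star = upper_crossing()
c_exact = (7.0 + math.sqrt(97.0)) / 4.0   # positive root of 2c^2 - 7c - 6
```

Under these assumptions $P_2 > 0.5$ for $1 < c <$ about 4.21, and
by interchanging $\lambda$ and $\mu$, $P_1 > 0.5$ for $c$ between
about 0.237 and 1.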


Figure 5 points out only too well that great caution
should be used when applying the linear discriminant
function in situations when the populations are other than
normal.
TABLE 3

PROBABILITIES OF ERROR, UNIVARIATE
EXPONENTIAL DISTRIBUTIONS

                                  c = 1   c = 2   c = 3   c = 4   c = 5   c = 10
Linear discriminant
function, n = 1*       $P_1$      .5000   .4000   .3262   .2741   .2360   .1385
                       $P_2$      .5000   .5333   .5214   .5037   .4870   .4329
Linear discriminant
function, n = 2        $P_1$      .5000   .3736   .2652   .2009   .1567   .0627

c is a parameter such that $\lambda = c\mu$
$\lambda$ is the parameter of the F population
$\mu$ is the parameter of the G population
$P_1$ = P(assigning Z to G | Z came from F)
$P_2$ = P(assigning Z to F | Z came from G)
n = sample size

*For n=1, the probabilities of error $P_1$ and $P_2$ for the
linear discriminant function are equal to the corresponding
probabilities of error $P_1$ and $P_2$ for the nonparametric
discriminator.
FIGURE 5

Values of $\lambda$ and $\mu$ for which $P_1$ and $P_2$ exceed 0.5

c = parameter such that $\lambda = c\mu$
$\mu$ = parameter of G distribution
$\lambda$ = parameter of F distribution
$P_1$ = P(Z is assigned to G | Z came from F)
$P_2$ = P(Z is assigned to F | Z came from G)
n = sample size

*Linear discriminant function is equivalent to the
nonparametric discriminator for n = 1.





SECTION IV
SUMMARY AND CONCLUSIONS

In any discrimination problem one has a choice between
using parametric or nonparametric procedures. This choice
in general will depend upon three factors:

    (i)   the strength of the user's belief in his parametric
          model,
    (ii)  the loss that would be suffered by using the
          nonparametric rule if in fact the parametric form is
          correct,
    (iii) the loss that would be suffered by using the
          parametric rule if the actual densities depart
          from the parametric form assumed.

For the two population discrimination problem, Section
II of this paper concerned itself with (ii). In Section II,
it was assumed that the two populations being discriminated
were normal with equal covariance matrices. For the
univariate case, the parametric procedure used was the well
known linear discriminant function, which is known to be
optimal in this situation. The nonparametric procedure used
was the rule whereby a random variable was classified as
belonging to the population which had the nearest observation
to an observed value of the random variable being classified.
A comparison of these two procedures was made by computing
and comparing the probabilities of misclassification.
Also for the two population discrimination problem, an
investigation of the linear discriminant function and the
same nonparametric procedure was carried out when the two
populations were not normal but exponential. Again the
investigation was made by computing the probabilities of
misclassification for both procedures. This investigation was
made in Section III of this paper. Because of the lengthy
computations involved in computing the probabilities of error
for both of these procedures, the only cases considered were
the univariate case for sample sizes of 1 and 2. It was
shown that for the two cases investigated, sample sizes of
1 and 2, both procedures could give poor results depending
on the parameters of the distributions.

In conclusion, it seems reasonable that if the
populations to be discriminated are well known, and have been
investigated to be such that the normal distribution gives
a good fit and that the variance and correlation do not
change much when the means are changed, and if the
classification to be made warrants the labor of matrix
inversion, then the linear discriminant function should be
used. However, if the populations are either not well known
or are known not to be approximately normal or to have very
different covariance matrices, or if the discrimination is
such that small decreases in probability of error are not
worth extensive computations, then a nonparametric procedure
seems to be advisable. Which nonparametric procedure to use
is a matter of choice for the user.


Recommendations to be made on the basis of this paper
are:

    (i)   tabulate the probabilities of error for the linear
          discriminant function in representative situations
          for the case where the populations being
          discriminated are multivariate normal with equal
          covariance matrices.

    (ii)  further investigation (for larger sample sizes)
          of the linear discriminant function in the case
          where the populations being discriminated are
          exponential because of the importance of the
          exponential distribution in the field of life
          testing and other applied problems.

    (iii) investigation as to the effect of other distance
          functions for the nonparametric discriminator
          discussed in this paper in the case when the
          populations being discriminated are exponential or
          some other class of distributions.